perm filename DRAFT[AIM,DBL]1 blob sn#124699 filedate 1974-10-17 generic text, type T, neo UTF8
00100	.DEVICE XGP
00200	
00300	.FONT 1 "FIX25"
00400	.FONT 2 "SIGN57"
00500	.FONT 3 "SHD40"
00600	.FONT 4  "BDI25"
00700	.FONT 5  "NGR20"
00800	.TURN ON "↓_π{"
00900	.TURN ON "⊗" FOR "%"
01000	.MACRO B ⊂ BEGIN VERBATIM GROUP ⊃
01100	.MACRO E ⊂ APART END ⊃
01200	.TABBREAK
01300	.COMPACT
01400	.SELECT 1
     

00100	.PORTION TITLEPAGE
00200	.BEGIN CENTER 
00300	.RETAIN
00400	.PAGE FRAME 54 HIGH 91 WIDE
00500	.EVENLEFTBORDER←ODDLEFTBORDER←1000;
00600	.FONT 6 "SGN114"
00700	⊗6   BEINGS:⊗*
00800	
00900	⊗2REPRESENTATION OF KNOWLEDGE
01000	AS INTERACTING MODULES
01100	⊗*
01200	
01300	⊗4Applied to Automatic Program Synthesis⊗*
01400	.END
01500	.GROUP SKIP 10
01600	Fourth Draft:  NOT FOR DISTRIBUTION
01700	.GROUP SKIP 10
01800	⊗3DOUG LENAT 
01900	
02000	STANFORD UNIVERSITY
02100	
02200	ARTIFICIAL INTELLIGENCE LABORATORY⊗*
02300	
02400	.PORTION CONTENTS
02500	.GROUP SKIP 10
02600	⊗2TABLE OF CONTENTS⊗*
02700	.GROUP SKIP 10
02800	.SELECT 1
02900	.B
03000	    1.	Abstract  . . . . . . . . . . . . . . . .  1
03100	    2.	Introduction  . . . . . . . . . . . . . .  2
03200	    3.	Theory:   Ideas . . . . . . . . . . . . .  3
03300	    4.	Reality:  Examples  . . . . . . . . . . .  9
03400	    5.	Theory:   Design. . . . . . . . . . . . . 14
03500	    6.	Reality:  Examples  . . . . . . . . . . . 18
03600	    7.	Reality:  Results . . . . . . . . . . . . 21
03700	    8.	Theory:   Conclusions . . . . . . . . . . 25
03800	    9.	Acknowledgements  . . . . . . . . . . . . 29
03900		Appendix 1: BEING parts . . . . . . . . . A1.1
04000		Appendix 2: the BEINGs  . . . . . . . . . A2.1
04100		Appendix 3: BEING usage . . . . . . . . . A3.1
04200		Appendix 4: CF  program . . . . . . . . . A4.1
04300		Appendix 5: the dialogue creating CF  . . A5.1
04400		Appendix 6: trial running of CF . . . . . A6.1
04500		Bibliography  . . . . . . . . . . . . . . BIB.1
04600	.E
04700	.SELECT 1
     

00100	.PORTION ABSTRACT
00200	.PAGE FRAME 54 HIGH 74 WIDE
00300	.EVERY HEADING(⊗3BEINGS⊗*,,⊗4Doug Lenat⊗*)
00400	.EVERY FOOTING(⊗5Fourth Draft .... {DATE},page {IF PAGE = 1 THEN 1 ELSE PAGE})
00500	.COUNT PAGE PRINTING "1"
00600	.NEXT PAGE
00700	.GROUP SKIP 10
00800	
00900	⊗21. ABSTRACT⊗*
01000	.GROUP SKIP 10
01100	
01200	Knowledge may be organized as  a  community  of  interacting  modules
01300	(e.g.,  ACTORS [Hewitt]), in which control also resides.  By granting
01400	each module a complex structure, yet constraining that that structure
01500	be   standard   over  all  the  modules,  some  new  theoretical  and
01600	experimental results were found.  The domain  of  this  research  was
01700	heuristic  automatic  synthesis of inductive inference LISP programs.
01800	Since these modules, called BEINGs, are the only  entities  permitted
01900	to  exist  theoretically, the generated code must also be a community
02000	of BEINGs.  Like the original pool,  they  must  be  able  to  answer
02100	questions  about  themselves as they run.  An experimental system was
02200	partially implemented.  It  synthesized  a  few  programs  from  very
02300	restricted dialogues.  Some unexpected problems were encountered, and
02400	some aspects which were considered ignorable seem  not  to  be.   The
02500	performance  of  the  system  is discussed, and a preliminary view of
02600	BEINGs assessed.
     

00100	.PORTION THESIS
00200	⊗22. INTRODUCTION⊗*
00300	
00400		This  paper  reports  on a scheme for representing knowledge,
00500	partially  realized  in  an  experimental  system  (PUP6)  aimed   at
00600	generating  LISP  programs  from dialogues with the user. The methods
00700	employed are not formal, but rather involve structuring of  knowledge
00800	about  programming,  about  the  task  domain,  and about transfer of
00900	control.  To date, PUP6 has synthesized three significantly different
01000	target programs:  a concept formation program (similar to [Winston]),
01100	a grammatical inference program, and a simple information storage and
01200	retrieval  system  (referred  to  as, respectively, CF, GI, and INF).
01300	Specification is via rigid,
01400	extended dialogues carried on with the user,  in
01500	a  small  subset  of  English,  in  which  the task is delineated and
01600	questionable details are discussed.  The  specification  is  partial,
01700	and  the system makes assumptions continually. This is what is meant,
01800	in the sequel, by ⊗4Automatic Programming.  Target program⊗* will refer
01900	to code which has been successfully generated by such a system.
02000	⊗4Dialogue⊗* is the interactive communication between the user and the
02100	automatic programming system, which results in target code synthesis.
02200		The  CF  target  program  was cleanly specified, an annotated
02300	protocol was done, and the BEINGs needed to reproduce  the  reasoning
02400	present  in  that protocol were fasioned. Additions were made to PUP6
02500	before it could synthesize GI or INF.
02600		The  main  successes  of the experiment were that the desired
02700	reasoning steps in the original protocol
02800	were actually simulated, most of the BEINGs were used
02900	in  writing  more  than one of the programs, and the synthesized code
03000	could be interrupted and asked a few kinds of  questions.   The  main
03100	defects  were  the  inflexibility of the system to new dialogues, the
03200	inability for  the  user  to  add  new,  high-level,  domain-specific
03300	knowledge,  and  the verbosity of the system.  Some of these problems
03400	may arise from the annotated protocol technique employed to  get  the
03500	BEINGs  initially,  and  not  from  anything  inherent  to the BEINGs
03600	representation.
03700		Our  treatment will follow these lines:  First, the ideas are
03800	presented, including a  discussion  of  BEINGs.   Examples  are  then
03900	provided to illustrate these concepts. The next section describes the
04000	implementation, and explains the choice of targets.  Only  then  will
04100	performance  be  evaluated.  From  the experience with PUP6 are drawn
04200	several preliminary conclusions about both the utility of the  BEINGs
04300	representation and about problems relevant to Automatic Programming.
04400		The  appendices  present  further   details,   samples,   and
04500	examples.   First  comes  a  description  of  each BEING part, then a
04600	summary of the knowledge contained in each BEING.  Appendix  3  is  a
04700	table  of data recording how the BEING community interacts. The final
04800	appendices  present  excerpts  from  the  CF   program,   from   PUP6
04900	synthesizing CF, and from CF itself running.
     

00100	.NEXT PAGE
00200	.EVERY FOOTING(⊗5Fourth Draft .... {DATE},page {PAGE},THEORY:  IDEAS)
00300	⊗23. THEORY:  IDEAS⊗*
00400		How should knowledge be represented?  Many  researchers  feel
00500	that a simple, uniform formalism should be used for all facts; others
00600	disagree, claiming that complexity of  behavior  both  justifies  and
00700	demands complexity of representation.  The BEINGs concept may resolve
00800	this uniformity vs.  structure controversy; at the  least,  it  is  a
00900	compromise.  One  must  recognize  the  advantages  of each side, and
01000	refuse to give them up. The  benefits  of  the  former  include  easy
01100	addition  of  knowledge  [Newell]  and  simple, aesthetic methods for
01200	combining information [McCarthy]. Structure,  however,  is  necessary
01300	for (⊗4combinatorially⊗*) efficient handling of large amounts of data
01400	[MIT].  PUP6 integrates these two ideas into the concept of BEINGs. A
01500	BEING  is  a  collection  of  about  thirty  little bits of INTERLISP
01600	[Teitelman] code; the answers to thirty questions  about  the  BEING.
01700	That  is,  a  BEING  is  a small, loosely-knit LISP program, which is
01800	considered equivalent to its answers to these fixed questions.  Every
01900	scrap  of knowledge, every bit of control structure should be encoded
02000	into BEINGs. There should be nothing else  in  the  system  but  this
02100	interacting community.
02200		PUP6  embodies  only  some  of  the  ideas  and   constraints
02300	discussed  in  this section.  For example, economy demanded some very
02400	fast auxilliary functions, demons, and pure  data  structures.  These
02500	reduced  the  computation  time  by a constant factor (about ten), by
02600	saving on the overhead implicit in each call to a BEING; they did not
02700	therefore  reduce the ⊗4combinatorial⊗* effort involved. This will be
02800	explained in the REALITY section, along with any other deviations  of
02900	PUP6 from an ideal BEINGs community.
03000		Notice that while  each  BEING  is  highly  structured,  this
03100	structure is standard over the entire pool. Each BEING part is itself
03200	a little BEING, etc.,  and  this  infinite  regress  stops  when  the
03300	contents  of  a BEING part becomes sufficiently primitive. As will be
03400	discussed later, the BEINGs mimic a  human  programmer;  whatever  he
03500	considers  primitive  when  writing  the  program,BEINGs may consider
03600	primitive.  Typically this level is about the same as  the  INTERLISP
03700	[Teitelman]  language:     a primitive is a single INTERLISP function
03800	call, or a  few  simple  ones  connected  trivially.  Each  BEING  is
03900	cognizant  of  the  set  of  thirty  questions,  in the sense that in
04000	answering one of them it may freely ask  questions  of  other  BEINGs
04100	(often  through  nondeterministic  goal  statements.)  A  few  of the
04200	BEING-PARTS might be:     what is your basic idea and  purpose,  what
04300	effects  do  you have, how do you go about causing them, what must be
04400	ensured before you begin, how  complicated  are  you,  what  is  your
04500	chance  of  success, are  you recursive, whom else might you transfer
04600	control to, what alternatives to you exist, what BEINGs are a  little
04700	more/less  general than you are, do you evaluate your arguments, what
04800	is the nature of the value you return, why do you exist, why  do  you
04900	want  to  be  in  control  now, etc. The reader may wish to glance at
05000	Appendix 1, to  see  the  particular  set  of  questions  which  were
05100	actually  settled  upon.   When  he  feels  the  urge  for a concrete
05200	example, he should glance over pages 8-11 and Appendices 4 and 5.
05300		The BEINGs control themselves in a simple way. A very general
05400	BEING, SERVE_THE_USER, has control initially. In general, some  BEING
05500	⊗4B⊗*  will  be  in  control.   Its BEING parts are assembled into an
05600	executable function, and it is run.  During the course of its  reign,
05700	⊗4B⊗*  will  want things done and/or tested which it cannot manage by
05800	itself.  This corresponds to when a normal program needs  to  call  a
05900	subroutine.  What ⊗4B⊗* does is to call on SATISFY by name, supplying
06000	a description of what is wanted.  SATISFY assembles itself, asks  the
06100	entire  BEING  pool  "who can do this thing?", and collects a list of
06200	the reponders.  SATISFY then calls  CHOOSE_FROM  by  name,  supplying
06300	this  list  and  the original request. Each responder is asked why he
06400	should be allowed to go now (the WHEN part) and how costly he will be
06500	if he does go (the COMPLEXITY part.) The best responder BEING is then
06600	passed control. If  he  succeeds  in  his  mission,  SATISFY  returns
06700	control to ⊗4B⊗*. Otherwise the remaining responders are compared and
06800	a new one is tried. If they all fail, then SATISFY tells  ⊗4B⊗*  that
06900	it has failed. ⊗4B⊗* then decides whether to try something else or to
07000	fail itself. In addition to these goal  statements,  there  exists  a
07100	"world" consisting of assertions and variables with values. These are
07200	regarded as trivial cases of BEINGs, possessing only  one  part.   So
07300	⊗4B⊗*  may also query this data base by saying "what assertions match
07400	this one...", or by asking "what is the value of...".  These  actions
07500	can be programmed in a much more efficient manner than as BEINGs.
07600		Theory would indicate that BEINGs must be assembled by  other
07700	BEINGs dynamically. In practice, the way a BEING's parts fit together
07800	is uniform over all the BEINGs at all times. Thus one simple function
07900	(or  assembled BEING) can assemble all the BEINGs initially into LISP
08000	functions.  The precise algorithm for doing this is discussed in  the
08100	next section.
08200		BEINGs are the only  entities  permitted  (theoretically)  to
08300	exist in our system; ergo all our code must be written as BEINGs, and
08400	must be written by BEINGs.
08500		An  obvious but crucial consequence is that ⊗4some⊗* BEING(s)
08600	must write new BEINGs.  The idea is that the BEING  who  knows  about
08700	⊗4X⊗*  takes  charge  of  generating  BEINGs  relating to ⊗4X⊗*.  For
08800	example,  the  INSERT  BEING  can  do  inserting  by  one  of  a  few
08900	algorithms,  and  he  can  also  write (by specializing himself) more
09000	specific  insert  routines,  and  he  can  answer   questions   about
09100	inserting. This idea is analogous to any reliance upon experts (e.g.,
09200	[Feigenbaum]), and also reemphasizes the theme of any modular
09300	representation of knowledge.
09400		A second consequence is that the output code  will  have  all
09500	the ⊗4types⊗* of "intelligence" that any BEING community has:      it
09600	can write variations of itself, modify itself according to the user's
09700	complaints,  and  answer questions about what it is doing as it runs.
09800	The specialized code cannot, of course, write the full complement  of
09900	programs the original system could write.
10000		In a similar vein, ⊗4some⊗* BEING(s) must do the  translation
10100	of  the  users' quasi-English inputs into BEING-usable form. ↓_Each_↓
10200	BEING ⊗4X⊗* is charged with recognizing English related to  ⊗4X⊗*.
10300	Thus  translation
10400	consists  of  querying  "who  can  recognize  ..."  and waiting for a
10500	response. For example, our  INSERT  BEING  must  also  recognize  and
10600	process phrases involving inserting, stack-pushing, and merging.
10700		A result of this treatment of natural language
10800	processing is that any phrase which can be translated can be acted
10900	upon, and any phrase which can't be translated probably ⊗4can't⊗*
11000	be acted upon (even if it is rephrased).  Since patterns are used,
11100	if the global structure of the sentence is recognized, then the user
11200	need only be asked to clear up the difficult sub-parts.
11300		Three constraints are present which reflect new biasses  more
11400	than  new  insights:       one need not feel any reverence toward the
11500	anthropomorphic paradigm of programming (ignore details,  catch  them
11600	later by debugging).  Programming decisions continually crop up which
11700	the system is not able to resolve  at  the  time. The  BEINGs  system
11800	should  always  spend  a significant effort deferring the decision as
11900	long as possible.  When, at last, no progress can be made without its
12000	resolution,  and  if the system is still unsure, then either the user
12100	settles the question or else a backtrack point is reluctantly set up.
12200	But  often,  by  this  time,  some  new  information is present which
12300	enables the system to resolve the decision, thus reducing the  amount
12400	of  backtracking.   If  there  are two or more decisions which can no
12500	longer be deferred, the system tackles first the one estimated to  be
12600	the easiest (Occam's razor).
12700		The final bit of philosophy is that most of the  carelessness
12800	bugs  can be eliminated through this deferral, feed-forward, and very
12900	precise  record-keeping.  Humans  depend  on  their  adaptability  to
13000	compensate  for limitations in their brain hardware (long write times
13100	for long-term memory and  external  memory,  and  limited  short-term
13200	memory size force us to ignore details; see [Newell]) but there is no
13300	need for an ⊗4automatic⊗* programming system to do so.  For  example,
13400	when  a  list  structure  is  first  encountered,  the system records
13500	warnings that it is undefined, no accesses, insertions, or  deletions
13600	have  been  observed,  and  so  on.  Each such worry is weighted, and
13700	periodically the system resolves such  warning  notes  --  especially
13800	near the completion of the target program.
13900		The BEINGs representation  is  suitable  for  simulating  any
14000	complex  task ⊗4T⊗* involving frequent small interventions by various
14100	experts. In addition to writing programs, this activity could  be  as
14200	diverse  as  playing  chess,  running a business, simulating physical
14300	interactions,  and  playing  volleyball.  For  the  latter  task,   a
14400	BEINGs-based system might create a specialized BEING for each player,
14500	perhaps with a complexity vector indicating his abilities,  reflexes,
14600	etc.  The  BEING  in  control would be the player hitting the ball. A
14700	specialized  Choose-from  BEING   would   decide   who   goes   next,
14800	occasionally interpreting a tie between BEINGs vying for control as a
14900	collision on the court.
15000		There  is  one  particular  bias  of  BEINGs  toward  writing
15100	high-level programs, over other activities. The  new  BEINGs  created
15200	during  the  execution of a specified task are kept distinct from the
15300	original set  of  BEINGs.  Then  a  ⊗4program⊗*  for  task  ⊗4T⊗*  is
15400	accomplished  by doing ⊗4T⊗* and then dumping the new BEINGs out onto
15500	a new file.  The entire old  BEINGs  pool  is  then  treated  as  the
15600	language  supporting this file. As a corollary, asking a BEINGs-based
15700	system to write any subset of itself is the null task!
15800		Most  programmers  intentionally augment their code to aid in
15900	later debugging, modification, and readability by humans.  These aids
16000	are  typically  comments,  summaries,  over-generalized  subroutines,
16100	optional print statements,  and  runtime  statistics.  The
16200	structure  of  the  program  itself is also  recognized as a powerful
16300	manipulable element.  The length of any program written as a pool  of
16400	BEINGs  is several times as long as a clean hand-coded version.  This
16500	extra code, the parts of each new BEING which are superfluous, may be
16600	viewed as well-organized self-documentation.  They should improve the
16700	debugging, expansion, and  accessibility  (to  alien  users)  of  the
16800	synthesized code.
16900		Some assertions are so meaningful  to  the  user,
17000	that they should be stored in the code itself.  At any time, the user
17100	may look at a piece of code; the comments present  ⊗4then⊗* are  all
17200	assertions pertaining to that piece of code at that time.
17300	BEINGS may use comments at  several  different  levels.  At  the
17400	lowest  level,  each  comment  is  merely  a  unique token; it may be
17500	inserted, removed, or searched  for.   Slightly  better,  the  BEINGs
17600	system can ask "is there a comment involving ...". Still higher level
17700	usage of comments would occur if a BEING looked at a comment  totally
17800	ignorant of it, and TRANSLATEd it into something meaningful. 
17900		When imagining an ideal AP  (automatic  programming)  system,
18000	one  ability  we  might  wish  for  is  that  it  be  able to input a
18100	well-documented program, glean pure programming knowledge from it,
18200	and  transform  this  into  a  format  it  can use.  Also, to improve
18300	itself, it should be capable of digesting a sample, valid AP  dialog,
18400	and (perhaps through additional user interaction) modify itself so it
18500	could actually carry on that  dialog.  The  BEINGs  idea  may  be  an
18600	insufficient framework for achieving this.
18700		BEINGs are now contrasted against  the  concepts  of  ACTORS,
18800	FRAMES, HACKER, formal AP systems, and macro expansion schemes.
18900		ACTORS [Hewitt]  provided  the  key  concept  of  integrating
19000	uniformity  of  construction  with  sophistocation of behavior. There
19100	are, however, many things one wants to know about  a  routine,  other
19200	than  its  value  and its control successor. The structure isn't rich
19300	enough to allow ACTORS to communicate in a ⊗4descriptive⊗* way. There
19400	seems to be an infinite regress in a theoretical community of ACTORS,
19500	just as with BEINGs.  Hewitt suggests stopping  when  his  ACTOR-part
19600	(the value returned) becomes constant.  The number of BEING-parts and
19700	the high level of each's activity precludes using this  criterion;  a
19800	much  less  rigorous  judgement  of  primitivity  was  used in PUP6's
19900	BEINGs.
20000	
20100		FRAMES [Minsky] seems superficially similar  to  BEINGs,  but
20200	there  is a deep difference: 
20300	FRAMES intentionally have default values already
20400	filled in  for each frame, and replaced only when  necessary.
20500	This  is  akin  to  automatic  programming by blind debugging, but is
20600	antithetical to the fastidious bookkeeping BEINGs philosophy.  As  an
20700	example,   consider   the   writing   of  a  short,  recursive,  list
20800	manipulation LISP program (reverse, flatten, assoc, alternate,  etc.)
20900	A human, consulting the proper frame in his head, knows to ignore the
21000	base step for a while, then fill it in, usually  as  ⊗4if  NIL,  then
21100	NIL⊗*, or, for numerical functions, ⊗4if NIL, then 0⊗* or ⊗41⊗*.  The
21200	human makes this default synthesis, tries out the program, and  looks
21300	closer  (to fill in this "slot" properly) only if something calls his
21400	attention to it (such as a LISP error message.) A BEINGs system would
21500	also  defer working on the base step, but could -- and would -- place
21600	a note about that deferral into the assertional warning base.  Before
21700	thinking  it  was  finished,  the  system  would  attend to that note
21800	carefully. A word in defense of FRAMES:   they  are  meant  to  be  a
21900	construct  to  model  perception,  and  therefore the default concept
22000	makes sense for them. BEINGs are clearly non-anthropomorphic in their
22100	internal  workings,  and  would  be very unlikely to occur as a human
22200	perceptual mechanism.
22300		HACKER [Sussman] differs from BEINGs in the same  qualitative
22400	way.  The orientation of HACKER is to put together pieces of programs
22500	into something close to the final desired target, then  try  and  run
22600	it.  When  errors result, they are analyzed and then patched.  BEINGs
22700	is oriented toward writing bug-free code; what it writes must  always
22800	coincide  with its internal specifications of that code. The user may
22900	later change his mind, and a BEINGs system must be able to modify its
23000	own  programs,  but  this  process  is much better defined than blind
23100	debugging. On the other  hand,  if  a  BEINGs  system  does  have  an
23200	unexpected bug in it, it may not be able to recover from it. One
23300	way to overcome this would be to include special recover-from-bugs
23400	BEINGs.
23500		The  formal  manipulation  systems which also synthesize code
23600	are so different from BEINGs, that it is difficult to  compare  them.
23700	These  logical systems (e.g., [Luckham]) attack an entirely different
23800	problem.  Their task is specified rigorously  in  advance,  and  they
23900	proceed  formally  to  search  for  a  solution  program.  BEINGs are
24000	designed to work on a much higher level, heuristically.  Each  action
24100	is  implicitly justified by the fact that the user can later describe
24200	to the system
24300	what is wrong with the program it's generated. BEINGs  should
24400	thus  be  tolerant of user ambiguity and error.  Also, BEINGs are not
24500	limited to automatic programming.
24600		Like  ⊗4any⊗* AP system which writes structured programs, the
24700	external behavior  of  a  BEINGs  system  applied  to  this  task  is
24800	reminiscent  of  macro  expansion regardless of ⊗4what⊗* the internal
24900	BEING  interactions  are.  One  major  distinction  between  the  two
25000	concepts is when and how the macros are expanded. Traditional answers
25100	are:  at  compile  time,  by  rigid   substitutions   (although   not
25200	necessarily  context-free.)  BEINGs  systems expand their "macros" at
25300	times they choose, using knowledge gleaned from each other  and  from
25400	the  user.   One  could  consider  such  a  system  as  a smart macro
25500	expansion scheme.
     

00100	.NEXT PAGE
00200	.EVERY FOOTING(⊗5Fourth Draft .... {DATE},page {PAGE},REALITY:  EXAMPLES)
00300	⊗24. REALITY:  EXAMPLES⊗*
00400		This  section  presents  examples  of  the following:       a
00500	programming knowledge BEING, an explanation of what  happens  when  a
00600	BEING  is called, a concept formation knowledge BEING, a descriptions
00700	of a piece of the user-PUP6  dialogue,  and  some  justification  for
00800	using functions, call-by-name, demons, and assertions. Although these
00900	examples are meant to clarify  the  preceding  section's  theoretical
01000	ideas, they are all drawn from the actual PUP6 system.
01100		4.1.   OBTAIN_USABLE_INFORMATION  is  a  typical  high-level,
01200	domain-independent  BEING.    The  WHEN  part  consists  of predicate
01300	"feelers"  which  sample  the  world  and  report   their   qualities
01400	numerically.   A reason is supplied with each feeler:
01500	an English sentence stored for the benefit of inquisitive users.
01600	The numbers are
01700	then simply added to decide how a propos the  BEING  is  at  present.
01800	This  scheme  is  adequate  but  undesirable;  one would like them to
01900	discuss descriptions of what they encounter; but  the  WHEN  part  is
02000	used  only  to  break  ties  between BEINGs vying for control, and it
02100	usually doesn't matter what order they go in.  Thus  a  simple,  fast
02200	method  of  selection suffices.  This particular BEING (whose parts
02300	are exhibited on pages 10 and
02400	11) has feelers which respond that it is ⊗4always⊗* an  undesirable
02500	(-10)  thing  to  do, but ⊗4may⊗* be indicated (+111) if there exists
02600	new but not yet usable information. These numbers, like all the parts
02700	of all the BEINGs initally in PUP6, were decided upon and inserted by
02800	hand.
02900		The   IDEN   parts   are  collected  together  into  a  large
03000	translation table.  When  a  form  ⊗4LI⊗*  must  be  translated,  the
03100	TRANSLATE  BEING goes through this table, asking each IDEN part if it
03200	claims to recognize ⊗4LI⊗*. Each IDEN runs its  own  little  program,
03300	typically  some  type of pattern match involving ⊗4LI⊗* and the state
03400	of the world, to answer this question.  Some measure of  goodness  of
03500	match  is  also  reported.   If  there  is  more  than one responder,
03600	CHOOSE-FROM picks the one with the highest priority of  match.    The
03700	winner runs a little program which ultimately returns a BEING-call or
03800	a constant as the translated value of ⊗4LI⊗*. This program might call
03900	other  BEINGs,  often  calls  TRANSLATE  several times recursively on
04000	parts   of   ⊗4LI⊗*.     Now   examine   the   IDEN   part   of   the
04100	OBTAIN_USABLE_INFORMATION  BEING  (next page).
04200	There are just two classes of phrases that this BEING can recognize.
04300	If two conditions are met,
04400	it says to ⊗4call⊗* OBTAIN-USABLE-INFORMATION with, as argument,  the
04500	result of calling TRANSLATE on the second element of ⊗4LI⊗*.
04600	If some BEING or user wants to find out more about anything, then 
04700	this BEING can be called with that thing as argument.
04800		The EFFECTS parts of each BEING are similarly collected  into
04900	one  table  to  facilitate  asking all BEINGs simultaneously "Can you
05000	cause X to occur?" Each EFFECTS part examines X and  the  world,  and
05100	either  replies No, or else returns a little program (involving calls
05200	and constants) which  should  produce  the  effect,  with  a  certain
05300	probability.   This program will probably involve a call to the BEING
05400	itself. The BEING below shows that it should be called to acheive the
05500	existence of new, usable information (see the MAIN:EFFECTS part, page
05600	11).
05700		The WHAT, HOW, and  WHY  parts  are  mainly  for  the  user's
05800	benefit.   When  a  choice  between  BEINGs  must  be made, the WHEN,
05900	AFFECTS,  and  COMPLEXITY-VECTOR  parts  are  queried.    If  a  new,
06000	manipulated   version   of   the   BEING   must   be   created,   the
06100	SPECIALIZATIONS,   ENCODABLE,    DATA-STRUCTURE,    PREDICATE,    and
06200	FORM-CHANGING  parts  might  be  relevant.   If the BEING fails, some
06300	BEING speicified in the ALTERNATIVES or GENERALIZATIONS parts might
06400	be tried.
06500		In the current case, the COMPLEXITY-VECTOR says that it is of
06600	average difficulty to call, its descendants may (.5 chance)  call  it
06700	back,  it has an average chance of success and cost of attempting it,
06800	and there is no (.1) good reason to defer it any longer.
06900		The  AFFECTS  part  of the OBTAIN_USABLE_INFORMATION BEING is
07000	clear. One BEING is definitely called, and four others might be. 
07100		The  absence  of  some parts, like DATA_STRUCTURE, PREDICATE,
07200	and NLAMBDA, indicate that these qualities don't apply.  The  absence
07300	of  other  parts,  like  SPECIALIZATIONS and ALTERNATIVES, indicate a
07400	lack of thoroughness in filling out unneeded BEING parts.
07500		The   META-CODE   has  us  choose  from  the  following  four
07600	alternatives,    each    of    which    is    itself     a     BEING:
07700	Get_Brand_New_Information  (in  English,  from  the  user), Translate
07800	something    (transform     from     English     to     BEING-calls),
07900	Analyze_The_Implications  (of some existing, translated information),
08000	Extract_a_Relevant_Subset   (of   the   existing   information)    to
08100	concentrate upon.
08200	Calls are nondeterministic only when the BEING doesn't know exactly
08300	which BEING to call. If the ⊗4set⊗* of possible choices is known, as
08400	in this case, then CHOOSE-FROM is called with the choice set explicit.
08500	If even this isn't known, then CHOOSE-FROM must query the EFFECTS
08600	parts of all the BEINGs in the system.
08700		Below are exhibited the actual representation of  this  BEING
08800	in  PUP6,  and  the  function  which  would  be  executed  if it were
08900	⊗4called⊗*.
09000	
09100	.SELECT 5
09200	.BEGIN NOFILL
09300	
09400	
09500	⊗4↓_PART_↓    ↓_actual value of the part for OBTAIN:USABLE:INFORMATION_↓ ⊗*
09600	
09700	IDEN (((AND (EQUAL (CAR LI)
09800		    	OBTAIN:USABLE:INFORMATION)
09900		  (EQUAL (LENGTH LI)
10000			 2))
10100	       (OBTAIN:USABLE:INFORMATION (TRANSLATE (CADR LI))))
10200	      ((MATCH ( FIND OUT MORE ABOUT ANY1)   LI)
10300	       (OBTAIN:USABLE:INFORMATION ANY1)))
10400	BEING T 
10500	EXPLICIT:ARGS (U) 
10600	WHAT (TUPLE OBTAIN SOME INFORMATION WHICH CAN BE USED) 
10700	HOW (TUPLE OBTAIN NEW FACTS ABOUT OLD INFORMATION, OR OBTAIN TOTALLY
10800	     NEW INFORMATION) 
10900	WHY (TUPLE PUP HAS NO MORE INFORMATION THAT IT CAN USE TO PROGRESS) 
11000	WHEN ((T -10 (TUPLE SINCE THIS IS AN EXPONENTIALLY-GROWING, BAD THING
11100		      TO DO IN GENERAL))
11200	      (NEW:INFO:LIST 111
11300	             (QUOTE (BECAUSE WE SHOULD WORK ON UNASSIMILATED NEW 
11400	                      INFORMATION IF THERE IS ANY)))) 
11500	META:CODE (PROGN (SETQ BECAUSE
11600			   (QUOTE (WE CAN ONLY TRY TO OBTAIN USABLE 
11700				    INFORMATION IN ONE WAY AT A TIME)))
11800			 (CHOOSE:FROM (QUOTE ((TRANSLATE U)
11900	 				      (GET:NEW:INFORMATION U)
12000					      (ANALYZE:IMPLICATIONS U)
12100					      (EXTRACT:RELEVANT:SUBSET U))))) 
12200	EXPLICIT:ARGS:CHECK T 
12300	MAIN:EFFECTS (((NEW INFO ANY1)
12400	               (OBTAIN:USABLE:INFORMATION (@ ANY1)))
12500	              ((USABLE INFO ANY1)
12600		       (OBTAIN:USABLE:INFORMATION (@ ANY1)))) 
12700	AFFECTS (QUOTE ((CHOOSE:FROM CALLED)
12800		        (TRANSLATE POSSIBLE:CALLED)
12900	 	        (GET:NEW:INFORMATION POSSIBLE:CALLED)
13000		        (ANALYZE:IMPLICATIONS POSSIBLE:CALLED)
13100		        (EXTRACT:RELEVANT:SUBSET POSSIBLE:CALLED))) 
13200	COMPLEXITY:VECTOR (.5 .5 .5 .5 .1) 
13300	
13400	.SELECT 1
13500		4.2. When a BEING is ⊗4called⊗*, its parts are assembled into a
13600	function which is then executed. Here is the ⊗4FUNCTIONAL⊗* form of the
13700	OBTAIN:USABLE:INFORMATION   BEING:
13800	.SELECT 5
13900	
14000	(OBTAIN:USABLE:INFORMATION
14100	  (LAMBDA (U FN:VALUE FINAL:CO:REQ)
14200	      (PROG1 
14300	          (AND 
14400	              (SETQ BEING:STACK (CONS OBTAIN:USABLE:INFORMATION BEING:STACK))
14500	              (PUT OBTAIN:USABLE:INFORMATION SPEC:WHY BECAUSE)
14600	              (EVERY (APPEND CURRENT:DEMONS USER:INTERRUPT:DEMONS)
14700	                     (QUOTE APPLY*))
14800	              (SETQ BECAUSE
14900	             	   (QUOTE (WE CAN ONLY TRY TO OBTAIN USABLE 
15000	                              INFORMATION IN ONE WAY AT A TIME)))
15100	              (CHOOSE:FROM (QUOTE ((TRANSLATE U)
15200	           		           (GET:NEW:INFORMATION U)
15300				           (ANALYZE:IMPLICATIONS U)
15400				           (EXTRACT:RELEVANT:SUBSET U)))))
15500	          (SETQ BEING:STACK (CDR BEING:STACK)))))
15600	.END
15700	.SELECT 1
15800	
15900		The  process of assembling the BEING parts into a function is
16000	straight-forward.  First, the explicit arguments (those global to the
16100	BEING)  are  bound. The implicit arguments (those local to the BEING,
16200	like PROG variables) are initialized. The name of the BEING is pushed
16300	onto  the  BEING  control  stack  (pointing  to  its caller), and any
16400	newly-activated  demons  are  pushed  onto  the  demon  stack.   The
16500	ARGS-CHECK  predicates  are  evaluated.  Then PUP6 works to make each
16600	PRE-REQUISITE true in the world.  Each COMMENT is evaluated, then the
16700	CO-REQUISITES,  META-CODE,  and  current  demons  all are executed in
16800	pseudo-parallel.  Each POST-REQUISITE is then satisfied.  Since these
16900	activities are all embedded in an AND, any of them may cause the
17000	entire BEING to halt and fail, simply by returning NIL.
17100	In both cases, just  before  exiting,  the  demon
17200	stack is popped and the BEING stack is updated (usually just popped),
17300	and control passes to the delegated BEING (usually the one who called
17400	this  BEING,  at  the  state  when  he  called  it.)  A fancy context
17500	mechanism was available but not actually needed.
17600		The function which assembled all the BEINGs exploited the
17700	fact that it operated only at  system load time. Thus if the BEING 
17800	had no new DEMONs to activate, all the corresponding demon-stack
17900	manipulations could be omitted. Optimizations like these are clear
18000	from comparing the functional form and the description of how it
18100	should be created (see above).
18200	Another example in this BEING is that no CO-REQUISITES 
18300	occur, so no parallel processing need be simulated. 
18400		4.2 PARTITION_A_DOMAIN is a high-level, domain-specific BEING
18500	whose basic idea is to divide up a space into  categories.  Only  two
18600	BEING   parts   are   filled   in   here   which   were  absent  from
18700	OBTAIN_USABLE_INFORMATION; namely, SPECIALIZATIONS  and  DEMONS.  The
18800	SPECIALIZATIONS part says that to write a specific version of itself,
18900	a few decisions must be made. The results  will  then  indicate  what
19000	pieces  of  code  get  assembled together to form the new BEING.  The
19100	partition may be only partial (an element of the domain belongs to no
19200	class  of  the partition), only weak (an element belongs to more than
19300	one class), and, most importantly, the specialized BEING should  work
19400	by  repeatedly doing some of the following three activities: accept a
19500	class-name from the user and guess some of its elements, accept an 
19600	element from the user and try to guess which class it belongs to,
19700	or accept an <element, class-name> pair. Other BEINGs know
19800	about these accepting, guessing, checking activities.
19900		The  DEMONS  part says that during the partitioning, the only
20000	new demon to keep active is the one  called  Fringe_of_Consciousness.
20100	This   last  concept,  named  parodically  after  an  "impossibility"
20200	mentioned in [Dreyfus], is worth  exemplifying.   In  the  course  of
20300	writing  GI,  the  system  learns  that  the  main  task  is  one  of
20400	grammatical inference.  The Grammatical_Inference BEING then  asserts
20500	that  if  the  user ever mentions the word TEST, he may actually mean
20600	PARSE.   This is placed in the IDEN part of the TEST  BEING,  and  is
20700	therefore   only  demonized,  waiting  on  the  periphery  of  PUP6's
20800	concentration.   It is in fact triggered later in the dialogue, as an
20900	inference surprising to the user.
21000		4.4.  The dialogue to synthesize CF begins by PUP6 asking the
21100	user  for  any  task.   The user replies, ⊗4Write a program which does
21200	concept formation⊗*. There are many decisions that PUP6 knows about in
21300	writing  a  specialized  concept formation program, and it manages to
21400	defer them all.   For example, it must choose between classificatory,
21500	comparative,  and metrical concept formation. Since all three involve
21600	partitioning a domain of examples, PUP6 decides  it  can  defer  this
21700	choice  until  after code for the partitioning is synthesized.    The
21800	effort of the system is now  directed  toward  this  "piece"  of  the
21900	program.  When it is completed, an hour or two later, the system asks
22000	the user to make this decision. When he picks the first  alternative,
22100	which  involves ⊗4only⊗* partitioning,  the  system  prints that it has
22200	finished the entire CF task, and dumps the newly created BEINGs  onto
22300	a file.
22400		4.5. Earlier, the claim was made that the presence of popular
22500	AI language  features did not detract from the combinatorial behavior
22600	of the system, since BEINGs subsume each mechanaism used. This will
22700	now be sketched. Direct call by name may be viewed as a trivial type
22800	of pattern-directed call, 
22900	where each BEING can recognize its own name. See the IDEN part of the
23000	OBTAIN:USABLE:INFORMATION  BEING, for example.
23100		A demon
23200	in PUP6 is merely a function which gets executed periodically,
23300	and may occasionally trigger a BEING. This could  be  replaced  by  a
23400	BEING   whose   EXPLICIT:ARGS:CHECK   part  contains  the  triggering
23500	predicate, and whose META:CODE is simply a call by name  directly  to
23600	the  BEING  which  is  supposed to be activated.  
23700		An assertion can be
23800	viewed as a BEING containing  only  an  IDEN  part;  when  the  BEING
23900	recognizes a form which matches it, it returns the fully instantiated
24000	assertion. 
24100		A function is equivalent to a BEING with only a  META:CODE
24200	part;  one  knows  nothing about it before executing it.  
24300		Notice that
24400	the overheads saved by these mechanisms are substantial. For example,
24500	assertions replace an entire BEING call by a single CDR evaluation.
24600		The reader may have observed  the  static  quality  of  these
24700	examples,  and  wished  for  one of BEINGs in action, passing control
24800	back and forth in order to do something. But to present a  reasonable
24900	example   of  BEINGs  interacting,  it  is  necessary  to  understand
25000	something about the target  task.   Thus  we  describe  PUP6  in  the
25100	following  section,  including  how  the target task CF evolved. Then
25200	comes the dynamic example, in section 6.
     

00100	.NEXT PAGE
00200	.EVERY FOOTING(⊗5Fourth Draft .... {DATE},page {PAGE},THEORY:  DESIGN)
00300	⊗25. THEORY:  DESIGN⊗*
00400		A  highly  polished  presentation  of  a  large  system omits
00500	mention  of  the  design  and  implementation  mistakes.    This   is
00600	unfortunate;  many  decisions  which seem arbitrary are the result of
00700	painful re-working, and conversely.  The reader may  relax;  he  will
00800	find nothing polished about this treatment.
00900	
01000		Once  the  task  (automatic  program  synthesis from specific
01100	dialogue) was decided  upon,  considerable  attention  was  spent  on
01200	choosing  an  appropriate  domain  of  target  programs.  The choice,
01300	inductive inference  (II)  [Kant], merits  discussion.  The  obvious
01400	difficulty  appears  to  be  the complexity of the programs involved:
01500	they  are  two  orders  of  magnitude  larger  than  any   previously
01600	synthesized  programs. But there are four big attractions:
01700		(i)
01800	A  wide  range  of  complexity  exists,  from  a  one-page   sequence
01900	extrapolator   to   Meta-Dendral.
02000		(ii)   Each   increasingly
02100	sophisticated II program  uses  many  of  the  concepts  embodied  in
02200	simpler  II programs.
02300		(iii) If a super-human target program is
02400	ever written, it could itself contribute to the  field  of  Automatic
02500	Programming! (For those reading this in the 1970's, this is humorous.
02600	For those later readers, it may be commonplace.)
02700		(iv)  Since  we
02800	who  write  AI  (and  especially  AP)  programs  are the inductive
02900	inference professionals, our reliance on  introspection  is  as
03000	valid  --  and potentially as valuable -- as chemists' protocols were
03100	to Dendral.
03200		After  studying  many  sample  programs  from  the II domain,
03300	sequence extrapolation [Pilvar] seemed the most reasonable  beginning
03400	task. It was quickly learned that this was too easy: humans have only
03500	a few techniques for extrapolating  sequences,  and  a  very  limited
03600	capacity   for   composing   them.   Thus  a  rather  rigid  sequence
03700	extrapolator writer may be capable of generating  a  large  class  of
03800	super-human programs (see section 4.2 on
03900	Sequence-Extrapolator-Writer, in [Green]).
04000		The  next  candidates  were grammatical inference and concept
04100	formation.  Determined not to choose too simple  a  task  again,  the
04200	latter program was finally decided upon.  The particular target was a
04300	variant of [Winston]. To make the goal more tractable, certain  parts
04400	of   Winston's   description   were  ignored,  namely  the  heuristic
04500	graph-matching section, and the uniformity requirement that relations
04600	on features be functionally indistinguishable from features.
04700	This last phrase means that CF can store (MUSTNOT (SUPPORTS a b))
04800	differently from the representation of simple 
04900	features like (TOUCHES a c).
05000		It seems instructive now to describe how the  target  program
05100	should operate. It repeatedly scans a scene and tries to name it. The
05200	scene is broken into a set of features and a set of objects. Each 
05300	feature is  a  relation
05400	on  one  or  more  objects  in  the  scene.  Internally,  the program
05500	maintains a model for each differently-named 
05600	concept it has ever encountered.
05700	The  model  contains  a  description  of  the objects expected in the
05800	scene, a set of features which must be present in the scene (else  it
05900	can't  be the same as this concept), a set of features which must not
06000	be present in the scene, and a set of features which may or  may  not
06100	be  present.  Thus  a model is like an archtypical scene plus a name.
06200	Each time the program is confronted by a new scene, the program  must
06300	scan its models until it finds one which matches it. That is, all the
06400	model's MUST features are present, and all the MUSTNOT  features  are
06500	absent  from  the  observed scene. It informs the user of this guess,
06600	and accepts the proper concept name. If it  guessed  incorrectly,  it
06700	modifies its models. The wrong guess model may have features added to
06800	its MUST or MUSTNOT sets; the correct concept name model may have  to
06900	be  modified or (if it's a new concept) created and inserted into the
07000	list of models.  See the flowchart on page A4.7.
07100	.B
07200	
07300	For example, a scene might be:     OBJECTS a,b,c,d
07400		(GREEN a)(BLUE c)(TOUCHING c d)(SUPPORTS a c)(SUPPORTS b c).
07500	
07600	A model for an arch might be:      OBJECTS a,b,c
07700		MUST 	(SUPPORTS a c)(SUPPORTS b c)
07800		MUSTNOT (TOUCHES a b)
07900		MAY	(GREEN a)(WEDGE c)(PRISM a)(BLOCK b)(PARALLEL a b)
08000	.E
08100		Suppose  that the target program reads in the above scene and
08200	tries to match it to  the  above  model  for  consistency.  The  MUST
08300	relations  should  all  be  present.   Yes,  the  scene  does contain
08400	(SUPPORTS a c) and (SUPPORTS b c). Next, the MUSTNOT  relations  must
08500	be absent from the scene. Sure enough, (TOUCHES a b) isn't there.  So
08600	the model and scene are consistent.  If the user verifies this guess,
08700	then  the MAY set is augmented with (BLUE c) and (TOUCHING c d), and
08800	the OBJECTS set is augmented with "d."
08900	If the user denies that the scene is an arch, the target program
09000	sees if there are any relations in the model's MAY set which do not
09100	occur in the scene. If so, one of them (e.g., (PARALLEL a b)) will
09200	be transferred from the MAY to the MUST set.  If no such feature 
09300	existed, the program would look for a feature present in the scene
09400	but absent from all sets of the model (e.g., (BLUE c)), and insert
09500	it into the MUSTNOT set.  Also, if the user disagreed with the guess,
09600	he would be asked what the true name of the concept was, and that
09700	concept's model would have its MAY set augmented by any new 
09800	features in the scene. Any features on its MUST or MUSTNOT sets
09900	which contradicted the scene would be transferred to the MAY set.
10000		After the target concept formation program was specified,  it
10100	was trimmed and then rewritten as a structured program [Gadwa]. Next,
10200	a complete dialogue was  simulated  between  the  user  and  a  human
10300	programmer  (referred to as the system-player) playing the role of an
10400	"intelligent"  automatic  programming  system  (similar   to,   e.g.,
10500	[Balzer]).      The   system-player   kept   careful  records  as  he
10600	programmed, and tried to create a bug-free  structured  program.  The
10700	dialogue  was  then annotated:     after each user response, comments
10800	were inserted which described the "states" the system-player had gone
10900	through before printing his next response.  This included blind paths
11000	which were tried, use of outside world knowledge,  and,  in  general,
11100	was  meant  to  be  the "intelligence" necessary to do the task.  The
11200	fear was that a system could be built which synthesized  the  concept
11300	formation  program,  and  yet  was  so unintelligent that nothing was
11400	learned from it. (see section 4.1 on PW1, for example,  in  [Green]).
11500	Hopefully,  any  system  which  (i) got the target program correctly,
11600	(ii) followed our initial dialogue, and, most importantly, (iii) went
11700	through  the  same  line of reasoning that our comments indicated the
11800	system-player 
11900	followed, would be far along the road  toward  intelligence.  A
12000	corollary of this incremental annotated protocol approach is that the
12100	abilities of the user must coincide with those  of  the  subject  who
12200	participated  in the protocol (see also [Woods]). The system would be
12300	far along the road toward automatic programming if it also  (iv)  was
12400	able  to  write  CF  from several dialogues, from several users, with
12500	little preparation. PUP6 was not designed to do this, and in the  end
12600	it  proved to be a serious deficit.  
12700		The target user must be familiar
12800	with  LISP,  well-grounded  in  computer  science,   and   have   the
12900	input/output  behavior  of  the  concept  formation (CF) program very
13000	clearly in his mind. The natural language front  end  is  so  brittle
13100	that  the user must also phrase his responses very similarly to those
13200	of the original protocol subject.  This  factor  prevented  dialogues
13300	from multiple users from being successful.
13400	
13500		At  this  point,  most  of  the  ideas described in section 3
13600	surfaced, including a rough initial set of BEINGs.  Each one had  not
13700	much  more  than  a  name and a vague description of what it must do.
13800	The  dialogue  was  cycled  through  again,  painstakingly,  and  the
13900	comments  were  replaced  by a description of which BEINGs would call
14000	which other BEINGs, why, and the results  of  each  such  call.   The
14100	constraints   on   each  BEING  thus  grew,  sometimes  changed,  and
14200	occasionally a new BEING or  BEING  part  had  to  be  added  to  the
14300	community.  This  process  was  long  (200 hours) and produced a long
14400	(200-page) protocol, actually a hand trace of the  system  execution.
14500	About  eighty  BEINGs  were needed:  a dozen domain-specific ones and
14600	the rest  domain-independent  programming  knowledge.   About  thirty
14700	different  BEING  parts  were  called  for,  and  they  are listed in
14800	Appendix 1.  The next appendix describes each BEING briefly; only the
14900	most important knowledge is mentioned.
15000		A result of  this  method  of  incremental  specification  of
15100	BEINGs  is that each BEING has only those parts which are going to be
15200	used sometime during the ensuing dialogue. This seemed too  specific,
15300	so  some  effort  was  spent  filling out parts that weren't strictly
15400	necessary to write the  concept  formation  program.    Perhaps  more
15500	careful  attention  to  this activity would have made the system more
15600	tolerant of changes in the dialogue. During the course  of  massaging
15700	PUP6  into  writing  the  other  target programs, very few additional
15800	parts had to be added to existing BEINGs. Typically,  either  an  old
15900	part had to be generalized or else a new BEING created. After writing
16000	CF, GI, and INF, there are now an even hundred BEINGs in PUP6.
16100		The data on how filled out each BEING was -- and had to be --
16200	is presented in several  forms,  here  and  in  the  appendices.   In
16300	Appendix 1, next to each BEING part is the percentage of BEINGs which
16400	had to have this part specified for them. Appendix 3 reveals how each
16500	BEING  was  actually  used,  including  the number of its BEING parts
16600	which exist and were called upon during a dialogue. On  the  average,
16700	each  BEING  part  was present in 36% of all BEINGs, and, also on the
16800	average, each BEING had 10.5 of its 29 parts specified. This says 
16900	that each BEING was actually asked a third of all possible questions
17000	and that each question was needed in about a third of all the BEINGs.
17100		The  principle that "everything is BEINGs" was relaxed in the
17200	interests of absolute CPU time and feasibility  of  coding.   Besides
17300	BEINGs,  PUP6 employs functions, demons and an assertional data base.
17400	During  the  programming,  100  small  functions  were   written   to
17500	supplement  the  language.   Most  were  typical  current AI language
17600	features [Bobrow] (pattern-matching, demons, special data types),  or
17700	tiny additions to INTERLISP [Teitelman] (flatten, set-difference), or
17800	functions      which      manage       BEINGs       (add-a-new-being,
17900	print-a-being's-parts).   Many of these features were simplified (and
18000	speeded up) by using the knowledge that PUP6 was to be an AP  system.
18100	For  example, all the different types of assertions that an AP system
18200	might want to make deal with different concerns of programming.  Thus
18300	a  group  of about forty different assertion patterns could be fixed,
18400	each one becoming a named  list;  this  almost  eliminated  searching
18500	during retrieval from the assertional base.
18600		The initial  programming  took  only  a  hundred  hours,  but
18700	several  times that amount of work was required before the system was
18800	debugged  to  the  point  of  successfully  following  the  annotated
18900	dialogue.
     

00100	.NEXT PAGE
00200	.EVERY FOOTING(⊗5Fourth Draft .... {DATE},page {PAGE},REALITY:  EXAMPLES)
00300	⊗26.  REALITY:  EXAMPLES⊗*
00400	
00500		An example of the PUP6 community of BEINGs interacting will now
00600	be presented. Let's jump one third of the way into the dialogue which
00700	writes  CF. The system learns it must compare the input scene against
00800	its internally stored models of concepts, until it  finds  one  which
00900	doesn't  fail  the  comparison.   It  asks the user what it means for
01000	scene S to fail the comparison to model M. The user  replies,
01100	.NOFILL
01200	⊗4M  is  incompatible  with  the  observed  features  of  S⊗*.
01300	{FILL} The IDEN part of the
01400	CONTRADICTS BEING recognizes this sentence pattern, and transforms it
01500	to 
01600	.NOFILL
01700	(FORSOME  F  IN M_FEATURES (specialized:CONTRADICTS F S_FEATURES)).
01800	{NOFILL} The BEING
01900	which inserts this into the synthesized code requires that  the  user
02000	pick  some  proper  name  for  the function specialized:CONTRADICTS;
02100	this will be a streamlined contradiction test written by the
02200	CONTRADICTS BEING.  Say the user reponds by calling it
02300	#. The function FORSOME is so primitive that no  specialized
02400	version  of  it  is  written normally. 
02500		The system informs the user of
02600	where it is by simulating a cursor pointing into a graph  of  program
02700	code.  This  is  analogous  to  the textual and dynamic indices which
02800	[Dijkstra] suggests. 
02900		Later in the  dialogue,  PUP6  decides  it  must
03000	expand the function #. The CONTRADICTS BEING is again called on; this
03100	time  it  is  asked  how  to  write  a  specialized  version   of   a
03200	contradiction  test.  It  replies that SOME_OF the following types of
03300	contradiction  may  occur:        PROBABILITY:0,  PROBABILITY:1,  and
03400	PROBABILITY:ε(0,1).    This   is   the  germ  of  the  idea  for  the
03500	MUSTNOT/MUST/MAY categorization of features. The SOME_OF BEING  takes
03600	control,  and  asks  if  the  decision can be deferred.  The DEFERRAL
03700	BEING looks about, first asking if there is  any  non-null  piece  of
03800	code  that  all three have in common.  This fails, and eventually the
03900	DEFERRAL BEING reports failure.  SOME_OF sees it has  no  basis  upon
04000	which  to  form  a  guess,  and  wants somebody to ask the user.  The
04100	ASK_USER BEING takes over, and quickly finds out what it is that PUP6
04200	wants  to  learn.  The names of the three choices are so cryptic that
04300	their WHAT and HOW parts are printed out to  the  user,  as  well  as
04400	their names.  The user types back his choices, in our case all three.
04500	SOME_OF  composes  three  new  function   calls,   separated   by   a
04600	conditional:
04700	.B
04800	
04900	(COND ( (IS_OF_TYPE         F   PROBABILITY:0_PART_OF_M_FEATURES)
05000		(PROBABILITY:0_#    F    S_FEATURES))
05100	      ( (IS_OF_TYPE         F   PROBABILITY:1_PART_OF_M_FEATURES)
05200		(PROBABILITY:1_#    F    S_FEATURES))
05300	      ( (IS_OF_TYPE         F   PROBABILITYε(0,1)_PART_OF_M_FEATURES)
05400		(PROBABILITYε(0,1)_#  F  S_FEATURES)))
05500	.E
05600	TRICHOTOMY recognizes this as a split on a function's  values
05700	being  =0,  =1,  or  between  zero  and  one.   It  asks whether this
05800	particular  function  can  only  range  over  the   interval   [0,1].
05900	PROBABILITY  answers  affirmatively,  so  SOME_OF  replaces the final
06000	IS_OF_TYPE test by the constant T.  Later on, PUP6 must  worry  about
06100	the  other  two  tests, and about the three contradiction predicates.
06200	These latter entities know (their ENCODABLE  parts  tell  them)  that
06300	they  are  immediately encodable. A probability=0 contradiction means
06400	that arg1 must not occur in the  world  arg2  yet  it  does.  So  the
06500	META-CODE  of  the  PROBABILITY:0_CONTRADICTION  BEING  is defined as
06600	(MEMBER arg1 arg2).  This corresponds to a MUSTNOT feature F which is
06700	present  in  the  world  (in  the  set of S_FEATURES of the scene.) A
06800	probability=1 contradiction occurs when a feature arg1 must occur  in
06900	the  world  arg2,  yet  it  doesn't. So its definition is simply (NOT
07000	(MEMBER arg1 arg2)).  It is impossible for a feature with probability
07100	in  (0,1)  to  be  in  contradiction  with any world (lacking negated
07200	features), so this third predicate is defined as  the  constant  NIL.
07300	That  is,  if F is a MAY feature of the model M, then there can be no
07400	contradiction between F and ⊗4any⊗* features of the scene S.
07500		When a BEING is said to do or recognize or know something, as
07600	in the preceding and following paragraphs, what is actually meant  is
07700	that  one or more of its parts specificly encode that knowledge.  All
07800	the parts of the hundred BEINGs in PUP6 were coded by  hand,  not  by
07900	each  other.  They then can encode other BEINGs, interacting only via
08000	the dialogue.
08100		The  IS_OF_TYPE  BEING  recognizes  that it must work hard to
08200	replace each of the two case tests, unless  M_FEATURES  is  organized
08300	permanently  into  three  parts (thus permitting the complex function
08400	"IS_OF_TYPE" to be replaced by the simple one "MEMBER" in  the  above
08500	piece of code). The STRUCTURE_INDUCING BEING will take over, to probe
08600	the permissability and the difficulty of such a constraint.  It finds
08700	nothing  about  this  tripartite  structuring  which could damage any
08800	earlier synthesized code, and asks the  user's  blessing.  Notes  are
08900	added  to  the list of coding warnings, stating that any reference to
09000	the entire list of M_FEATURES must be replaced by  either  APPEND  of
09100	the  three  new lists, or else by three separate statements. GET_NAME
09200	is indirectly called, and he has the user name the three new sets  of
09300	features; say he responds by calling them MUSTNOT, MUST, and MAY. The
09400	system would at this point say to draw an arrow on its graph of code,
09500	from the function call (# F S_FEATURES) to the new block of code:
09600	.B
09700	
09800	  (COND ( (MEMBER        F   MUSTNOT_PART_OF_M)
09900		  (MEMBER        F   S_FEATURES))
10000	        ( (MEMBER        F   MUST_PART_OF_M)
10100		  (NOT (MEMBER   F   S_FEATURES))
10200	        (  T (COMMENT THIS "T" REPLACES "MEMBER F MAY_PART_OF_M")
10300		   NIL))
10400	.E
10500		This is now the META:CODE part of the new BEING called #.
10600	Most of the other parts are taken from its generalization, namely
10700	CONTRADICTS. During the course of writing this piece, however, some
10800	of these parts would be slightly changed. For example, its reason
10900	for existing would be strengthened when the simple MEMBER function
11000	calls replaced the slow IS_OF_TYPE  BEING calls.
11100	The actual transcript of this part of the dialogue imay  be  seen  on
11200	pages  A5.16  to  A5.20.  Notice that only a few messages have passed
11300	back and forth during all this processing.  The user has the  feeling
11400	of  merely  directing, constraining, and watching the activities of a
11500	busy programmer.   Unfortunately, "the user" is  not  generic;  there
11600	was  only one successful user. As was mentioned earlier, PUP6 insists
11700	on doing structured programming,  so  its  traces  are  superficially
11800	similar to  macro  expansion. Despite this concentration on planning,
11900	no BEINGs system 
12000	should mistakenly halt so long as any details remain.
     

00100	.NEXT PAGE
00200	.EVERY FOOTING(⊗5Fourth Draft .... {DATE},page {PAGE},REALITY:  RESULTS)
00300	⊗27. REALITY:  RESULTS⊗*
00400		The  character of the dialogue (described in Section 6 and in
00500	Appendix 5) is, of course, one important measure of the  intelligence
00600	of  an  AP  system.   One which asked "What is the first instruction?
00700	What is the second...?" would be universal but worse than useless. In
00800	this  section  are  some  proposed  questions  one  should  ask  when
00900	confronted by a new AP system and some measures of performance of any
01000	AP system which generates code heuristically from a dialogue.
01100		The questions break into two categories. First, one wants  to
01200	know  what the system does.  Most important, does it write a program?
01300	If  so,  how  complex  is  that  task,  and  what  is   the   program
01400	specification  like?     How precise must the user be:    may he err,
01500	forget, change his mind?  How detailed must his  dialogue  be?    How
01600	closely  does  the  dialogue determine the details of the final code?
01700	How does the user know where he is during the dialogue?
01800		Quite  seriously, an important question is whether the system
01900	writes a second program.  If so, how does it relate to the first  one
02000	synthesized?    How much of the AP system is actually used in writing
02100	⊗4both⊗* programs?  One should ask what the full range of programs is
02200	which  the system has successfully synthesized. The dual questions to
02300	these inquire whether a program may be generated by several different
02400	dialogues,  and  what  the  range  of variability is. Similarly, what
02500	about the target language: is it a parameter? how does it  relate  to
02600	the language the system is written in?
02700		Does the system modify as well as write code?     If so,  can
02800	it  modify  anybody's,  or only those programs it has written itself?
02900	What is its "style" of programming? One also wishes to know the  size
03000	of  the tool as well as the size of the job: How does the system size
03100	compare to target program size?
03200		Efficiency considerations are often the crucial
03300	pragmatic ones. Economics and current hardware restrict the amount of
03400	computation  which  may be done in toto; the expected lifetime of the
03500	universe restricts us from using the  most  elegant  algorithms;  and
03600	human  patience  restricts the interaction delay time (to a minute or
03700	two of real time). One must therefore  ask  things  like
03800	how  much  time  and  space  it  requires,  how long a delay there is
03900	between user inputs, etc.
04000		The  second  class of questions concerns the theoretical side
04100	of the AP system.  What new ideas is it meant to experiment with? How
04200	do each of these ideas fare?  Does it write its code in the right way
04300	(does it ask the right questions of  the  user  at  just  the  proper
04400	times)?   How extendable is it: how difficult is it to add new pieces
04500	of knowledge and to modify old ones?   Could  it  write  programs  in
04600	various fields, in various styles, in various sizes?
04700		How is knowledge  represented  internally?    Is  any  of  it
04800	duplicated? If so, what and why? Why doesn't this system solve the AP
04900	problem?
05000		PUP6's  answers  to most of the these questions are scattered
05100	throughout the earlier sections of this paper.  Below are its answers
05200	to the remaining questions.  Although BEINGs can theoretically handle
05300	user errors, PUP6 was set up expecting that the user would never  err
05400	or  forget.  He could, after the program was finished, try it out and
05500	describe how he wished it changed. Occasionally, PUP6  actually  make
05600	the  right modifications. For example, INF,
05700	the simple data storage and retrieval
05800	system which was written, was supposed to store and retrieve airline
05900	reservations.   After the synthesis is finished, the user tries
06000	out the program and finds that he is unable to delete more  than  one
06100	reservation  with  any  single  command. He complains about this, and
06200	PUP6 modifies the program so that he may specify a pattern,  and  all
06300	reservations  matching  that  pattern  will  be  purged.   While 
06400	assumptions of absolute 
06500	user reliability are reasonable for  little  programs,
06600	they  break  down  when the tasks reach the size of tens of pages and
06700	hours.
06800		Let  us  briefly  describe  GI  and  INF, the second and
06900	third programs PUP6 synthesized.   GI  is  a  simple
07000	grammatical inference program. It builds up a set a plausible  rules,
07100	rather  than  enumerating  through  the space of all grammars. Yet it
07200	lacks most of the "tricks" of  the  best  g.i.  programs.   The  user
07300	repeatedly specifies a string, labelled LEGAL, ILLEGAL, or ??. In the
07400	latter case, GI must print out its guess and accept the correct label
07500	from  the  user.   In  all  three  cases,  it  must update its set of
07600	plausible rules to be at  least  consistent  with  all  positive  and
07700	negative  instances  observed. In some versions of GI the user coaxes
07800	PUP6 to generate, a parse tree may be maintained  and  inspected;  in
07900	other versions, just a list of the rules used during a parse is kept.
08000		INF is a data-retrieval-on-keys system.
08100	The user repeatedly types
08200	in a request to insert, delete, or inspect a  record  (e.g.,  INSERT:
08300	PASSENGER  Jones  FLIGHT  741  FROM SFO TO LAX CARRIER TWA LEAVE 7:15
08400	ARRIVE 8:00).  Any unspecified fields are treated  as  dont't  cares;
08500	thus the request is matched against entries in the data base.
08600		The table below shows how each type of knowledge was used  in
08700	writing the three target programs.  All numbers refer to BEINGs.
08800	
08900	.BEGIN
09000	.GROUP
09100	.NOFILL
09200	.FONT 7 "FIX20"
09300	.SELECT 7
09400	.BREAK
09500	
09600	.ONCE CENTER
09700	                ⊗4U S E D   I N   S Y N T H E S I Z I N G⊗*         
09800	
09900	TYPE OF        CF    CF    CF    CF     GI    INF     not     Crea    Crea    Crea    Total          
10000	KNOWLEDGE      GI    GI   INF   only   only   only    used    -ted    -ted    -ted    BEINGs
10100	              INF   only   only                       ever    in CF   in GI   in INF
10200	
10300	Programming    39     7     5     5     3      1       14       52      27      21      174
10400	Domain          0     3     0     9     8      1        5        4      10       3       43
10500	Total          39    10     5    14    11      2       19       56      37      24      217
10600	.END
10700	
10800	The high percentage of programming BEINGs  which  were  used  in  all
10900	three  dialogues  is exciting.  The three domain-specific BEINGs used
11000	in both CF and GI  sessions  indicate  that  they  may  be  Inductive
11100	Inference  domain specific.  They aren't used in INF, which is not an
11200	inductive inference task. All three deal with partitioning a domain.
11300		A  specialization of a general programming BEING is listed as
11400	a programming BEING still  (in  the  CREATED  columns  of  the  above
11500	table.)  This  is  because  it remains sufficiently independent to be
11600	used in many ways, conceivably in many different programs.
11700		The  style  of  the synthesized programs is constrained since
11800	all code must be BEINGs. At a lower level,  there  are  many  trivial
11900	inefficiencies which are passed through a BEING which is to optimize
12000	LISP programs out of context, at a low level.  This BEING is
12100	vacuous now, but may soon contain a LISP optimizer being  written  by
12200	A.  Cohn of the SAIL AP group. Low level efficiency was intentionally
12300	ignored to expedite work  on  PUP6.   Nevertheless,  the  final  code
12400	produced  runs  in  about  the  same  time as the hand-coded versions
12500	(e.g., a few seconds per scene for CF).  It is several times as long,
12600	though,  since  it  must  be able to answer questions about what it's
12700	doing as it runs.  The  protocol  version  lacked  this  ability,  of
12800	course.
12900		One  crude  measure  of  concentration  of intelligence is to
13000	compare the system  and  target  code  lengths.  Ephemeral  numerical
13100	efficiency  data  such  as  this follows. PUP6 is 200 pages long when
13200	PRETTY-PRINTED, and has synthesized three programs, 7, 20, and 30
13300	pages long (1, 4, and 6 pages, when coded by hand.)
13400		A  brute  force  AP  system  would  require  a  time  roughly
13500	exponential  in  target  length, so log(time)/target_length (measured
13600	over several different programs and over several AP  systems)  is  an
13700	indicator  of  the  intelligence of an AP system.  For a good system,
13800	this number should (i) be small absolutely,  and,  more  importantly,
13900	(ii)  decrease  significantly as the target program length increases.
14000	For a ⊗4very⊗* good program writer, the  quantity  time/target_length
14100	stays   almost  constant.   This  is  not  quite  achieved  by  human
14200	programmers known to the author.
14300		PUP6  takes  60  cpu minutes to produce CF. During this time,
14400	50000 characters get typed out to the user, who  reponds  with  about
14500	2500  himself. The dialogue lengths are best specified in characters,
14600	since often the user will supply simply a number or  a  letter  or  a
14700	YES/NO  reply.   Dividing  these by a hundred, one can relate them to
14800	target and system lengths in lines. Even  the  character  counts  are
14900	very  rough,  because  the  format  of  the AP dialogue can easily be
15000	varied by a couple orders of magnitude.  The mean wait time  (between
15100	the user's input and the next machine output) is several seconds. The
15200	longest delay  between  successive  user  inputs  is  1  CPU  minute.
15300	Unfortunately,  PUP6  requires 100K to run in, which (with INTERLISP)
15400	exhausts the virtual memory of the current hardware being  used.  One
15500	would lose vast amounts of time by using intermediate storage devices
15600	or by running consecutive communicating jobs.
15700		INF  was  one fifth the size of CF, and took about a seventh
15800	as long to generate. The dialogue was shorter by a factor of three.
15900		Most  of  the  theoretical questions are answered by the very
16000	design of the system. Its ideational foundation has  been  discussed.
16100	When  the  user  sticks close to the original protocol, PUP6 asks the
16200	right questions at the right times because of its genesis  from  that
16300	annotated  protocol.   The  second  and third programs were attempted
16400	only to test its flexibility, and the results were mixed.
16500		First  the bright side. The increment to PUP6 to enable it to
16600	write the GI and the INF programs is encouragingly small. Almost  all
16700	the  programming-knowledge BEINGs are called in writing more than one
16800	of the programs. It is now clear  that  very  domain-specific  BEINGs
16900	were in control at the early, high levels. Later, more general BEINGs
17000	took over, followed by pure programming BEINGs.  The middle  category
17100	would  include II BEINGs like PARTITION_A_DOMAIN.  
17200		Regrettably, incrementing the system with new domain-specific
17300	BEINGs is not feasable for the average user.  To add a BEING, he must
17400	specify  all  of  its relevant parts. Each is inputted either as LISP
17500	code, as an English sentence PUP6  is  capable  of  interpreting,  an
17600	English  sentence  PUP6  is  just  supposed  to  remember as a canned
17700	response, or by pointing to a part of an existing  BEING  and  saying
17800	"like  that  one,  except  ...", where the preceding ellipsis must be
17900	very simple.  The interactions between the BEINGs, and  the  existing
18000	set of BEINGs, need not be fully understood; but the set of allowable
18100	assertion templates, the mechanisms by which each part is  used,  the
18200	special  data types involved (VECTOR, TUPLE), and the precise actions
18300	of each new part ⊗4must⊗* be known in order to safely  inject  a  new
18400	BEING.
     

00100	.NEXT PAGE
00200	.EVERY FOOTING(⊗5Fourth Draft .... {DATE},page {PAGE},THEORY:  CONCLUSIONS)
00300	⊗28. THEORY:  CONCLUSIONS⊗*
00400		The strengths and weaknesses of both BEINGs and PUP6 are
00500	reviewed.
00600	This experiment has helped clarify some 
00700	of the potential problems facing
00800	future AP work.
00900		While the number of  BEING-parts  is
01000	unimportant,  its magnitude (a few tens) may have significance to AP.
01100	The fact that is isn't ~3 explains the  difficulty  that  ACTORS  and
01200	cyberneticists  have;  the fact that it isn't ~1000 means that the AP
01300	problem is tractible.
01400		Some of the ideas discussed at the  beginning  of  the  paper
01500	have proven themselves to be useful in getting PUP6 to work. 
01600	Little programs
01700	can indeed be treated as if their essence  is  nothing  more  than  a
01800	collection  of  answers.   The  gain  from  standardization  is 
01900	conceptually easy
02000	addition of pieces to the system; one "merely" adds new BEINGs to the
02100	community. Their interactions with all the existing BEINGs needn't be
02200	known in advance. As was discussed in the previous section, the 
02300	actual addition is beyond the power of the untrained user.
02400		Indications   are  that  one  ⊗4can⊗*  locate  relevant
02500	information by organizing the knowledge in a pool,  with  each  piece
02600	(i)  responsible  for  recognizing  when  it  is  relevant  and  (ii)
02700	indicating new relevant information indirectly  via  nondeterministic
02800	pattern-directed  retrievals  and  assertions.  Notice that this puts
02900	all control structure into the hands  of  the  BEINGs;  there  is  no
03000	central  monitor  in  PUP6.   A  BEING's answers may at various times
03100	indicate that it should be chosen to be in control, and it will  then
03200	take  over.   Notice  also  that this relevancy problem is central to
03300	program maintenance as well as synthesis. It is not clear whether 
03400	or not this really beats the exponential growth of this problem.
03500		The  fact  that target code is in the format of BEINGs limits
03600	the size of the target programs (≥ one page) and their efficiency  (a
03700	BEING-call  is a very slow and intricate process) and of course their
03800	style. The most startling result -- which should have been forseen --
03900	was that "intelligent" target code is synthesized. This was mentioned
04000	casually in the IDEAS section, but its importance is clear: the  code
04100	is  self-commenting  in the strong sense that the code can answer any
04200	(of our set of 29) questions about itself.  Those which make sense at
04300	compile-time  can  be  asked  then;  those  which  make  sense during
04400	execution may be asked then.  For example, CF begins  running.  After
04500	several  scenes have been processed, CF is waiting for a new one.  If
04600	the user interrupts  and  asks  what  it  is  doing,  CF  will  reply
04700	"awaiting user input of a brand new scene." One can ask why, how, who
04800	will be affected, and so on, interrupt as frequently  as  desired  --
04900	even  after  each  BEING  transfers control. The actual questions and
05000	responses are not very impressive, but they are at least at the  same
05100	level as those which can be gotten from PUP6 itself as it runs.
05200		The set of questions the user was expected to want to ask the
05300	system  is  similar  to the questions one BEING wants to ask another.
05400	So when the "nice" user interrupts, his questions are almost always a
05500	simple  retrieval  from a property list (a GETP or a composition like
05600	EVALπ.GETP). When the average  user  interrupts,  his  questions  are
05700	often unintelligible to PUP6.
05800		Meaningful use of comments proved helpful. As an example, the
05900	comment at the end of the main outer loop was "COMMENT, INFINITE LOOP
06000	IN THIS PROG" (see page A5.10) for most of the session. Near the  end
06100	of  the  session,  the  CLARIFY_IMPROBABLE_SITUATION BEING recognizes
06200	this as a warning, works on introducing a breakaway  test,  and  then
06300	replaces  the  comment by one indicating that no infinite loop exists
06400	there anymore (see  page  A4.4).   The  advantage  to  embedding  our
06500	insertions in the code this way is that, at any stage, the user could
06600	inspect the code and get something meaningful  out  of  the  comments
06700	which were present.
06800		The  careful  bookkeeping   actually   did   eliminate   some
06900	carelessness  errors, though it had no effect on user errors or later
07000	program maintenance  directives.  These  latter  processes  are  less
07100	ill-defined  than  blind debugging, and in fact are about the same as
07200	programming itself [Dijkstra].  The deferral  of  decisions  has  the
07300	nice corollary of reducing (though not minimizing) communication with
07400	the slow user channel.
07500		Synthesizing a new, 
07600	clean target program  probably  would  require
07700	adding  a  few domain-independent BEINGs and several domain-specific
07800	BEINGs, and generalizing several parts of  several  existing  BEINGs.
07900	Since  the  dialogues  for  GI  and INF were not studied before-hand,
08000	their acceptability  would  have  demonstrated  the  success  of  the
08100	system.  Most of the existing BEINGs were used in generating these
08200	new programs, but a few new BEINGs had to be added. Equally
08300	discouraging, the dialogue to write INF is too long and detailed
08400	for the task at hand.  It also  seems  that  the  front  end  is  too
08500	brittle  to  allow  anyone  unfamiliar with the system to easily work
08600	with it.
08700		The   need   for  natural  language  flexibility  stems  only
08800	partially from inter-user  differences.   A  serious  and  unexpected
08900	source  of  conflict  here  is the amount of inference the user wants
09000	done.  This level is relatively subjective, and may fluctuate rapidly
09100	even  with  one  user  writing one program. Perhaps there should be a
09200	dial he can set. At one extreme, the system would  ask  the  user  to
09300	verify  even  the  most  trivial  inductions.  At  the  other extreme
09400	setting, the system would probably write  the  target  program  after
09500	just a few brief user inputs. The user would then try out one program
09600	after another, interjecting just one or two comments each time, after
09700	which  the  system would come up with the next version.  This program
09800	modification mode seems appropriate for someone ignorant of LISP, who
09900	nevertheless has the I/O behavior of his target clearly in mind.
10000		Some of the BEING parts  proved  embarrassingly  unnecessary.
10100	The  CO-REQUISITES  part was never used.  The only activity which had
10200	to be done concurrently was demon activation. The AFFECTS part was of
10300	interest  to  the  user  occasionally,  but  never to any BEING.  The
10400	EFFECTS part originally had a twin, describing what would  happen  if
10500	each  BEING  were  ⊗4not⊗*  called right now.  In each case, this was
10600	merely a trivial restatement of the frame problem.  So,  like  STRIPS
10700	[Fikes],  PUP6  ignores  the  frame  problem:     all changes must be
10800	explicit.
10900		Much  of PUP6's comments are boring and unnecessary; this was
11000	felt to be an engineering problem which would be ignored in all but a
11100	"marketable"  AP system. This view was probably incorrect. One of the
11200	most severe AP problems is having  the  system  inform  the  user  of
11300	precisely  what  he should know -- no more nor less.  This requires a
11400	sophisticated  user  model  cleverly  interfaced   to   the   current
11500	programming task.
11600		Another problem with large, long dialogues is user  error.  A
11700	human  has  great  difficulty keeping "on top" of everything.  He may
11800	get lost, forget, change his  mind,  or  misunderstand.  Again,  this
11900	problem  was originally considered ignorable, but has shown itself to
12000	be  a  limiting  practical  factor  in  wide  accessability.  It  was
12100	necessary  to  create  a  forgetful-user  demon  to prevent anaphoric
12200	reference to something which, though uniquely determined, hadn't been
12300	mentioned in an hour! Related to this is the problem of keeping
12400	the user informed of where he is. PUP6 simulated a continuous display
12500	of  the graph of the current partial program, augmented by static and
12600	dynamic cursors. This  use  of  dynamic  and  textual  indices  seems
12700	insufficient.   The  user was still often confused, wishing a section
12800	referred to not by pointing but rather by a brief, meaningful (to him
12900	only!)  phrase.   This may also be one of the major AP problems which
13000	must be studied further.
13100		These  worries  about large dialogues arise because future AP
13200	systems are viewed as tools for writing programs which humans  simply
13300	cannot  manage  currently:      viable natural language systems, huge
13400	operating systems, better AP systems.
13500		McCune  calls attention to an argument against ever needing a
13600	system whose target programs will be longer and more complex than the
13700	system  itself.  First,  empirically, optimizing compilers are viable
13800	and yet they dwarf almost all the code they generate.  Second, as the
13900	complexity  of the transformation increases, the length of a compiler
14000	increases rapidly.  For a truly intelligent AP system, one  might  be
14100	willing  to  tolerate a program several times as large as anything it
14200	could produce.
14300		An  argument  exists  to  counter this one.  First, one might
14400	object to drawing  an  analogy  from  compilers;  many  entities  are
14500	capable  of  producing  things  more  sophistocated  than themselves,
14600	larger than themselves, etc. (consider evolution). Second,  it  seems
14700	that there is a manageable body of information which one needs master
14800	to do programming [Informal].  Viewed differently [Dijkstra], one can
14900	write  programs  as long as he wishes if the program is designed in a
15000	clean way. Thus the size of AP systems will probably
15100	level  off  (albeit  huge
15200	compared  to  existing  programs)  even as the size and complexity of
15300	their generated code continues to increase and,  eventually,  surpass
15400	them.  Finally,  we  must  consider why we want a good AP system:  to
15500	help  us  write  large  programs  easily,  programs  which  might  be
15600	unmanageable  today.   For  this  reason alone, our AP system must be
15700	able to deal with programs larger than itself.
15800		The whole design of BEINGs  is
15900	oriented   toward  this  large-scale  end.  One  cannot  write  tiny,
16000	super-efficient programs using BEINGs, any more  naturally  than  one
16100	can  generate  efficient machine code using a high level language. 
16200	A BEINGs system might simulate this activity, but it would be as
16300	awkward and opaque as if they were asked to simulate volleyball. By
16400	relinquishing  more  and  more  control  to  the  computer   language
16500	environment,  the  programmer  sacrifices specialization of the final
16600	code for ease of  constructing  it  and  for  its  greater  size  and
16700	complexity.
     

00100	.NEXT PAGE
00200	.EVERY FOOTING(⊗5Fourth Draft .... {DATE},page {PAGE})
00300	⊗29. ACKNOWLEDGEMENTS⊗*
00400	
00500		There  are,  of  course,  countless  ideas  embodied  in  any
00600	concrete project.  Sweeping philosophical assumptions are made simply
00700	in   trying   to   do   Automatic   Programming    [McCarthy].    Any
00800	Program-Understanding  Program  should  include the best parts of all
00900	the best old ideas.  PUP6 relies  on  concepts  gleaned  from  Actors
01000	[Hewitt],   Demons   [Charniak],   heterarchy   [Reddy],   structured
01100	programming [Dijkstra], assertional  data  bases  and  flexible  data
01200	types  [Reboh],  pattern-directed  invocation of procedural knowledge
01300	[Winograd], the paradigm of dialogue [Floyd], and studies on  program
01400	specification techniques [Green].  Of course this list is incomplete;
01500	sophisticated ideas are captured in the languages themselves:   QLISP
01600	[Reboh],   INTERLISP   [Teitelman],  LISP  [McCarthy2],  and  English
01700	[Anonymous]. To this rich store, one may achieve new  heights  merely
01800	by synergy.
01900		Any success of the BEINGs project, as well as  the  precursor
02000	PUP  programs  [Green]  is due in large measure to the 
02100	creative energy of C.  Cordell  Green.   I  have
02200	also  drawn  upon  discussions  with
02300	D. Barstow, B.  McCune,  D.   Shaw,  E.
02400	Sacerdoti, L.
02500	Steinberg,  and  R.  Waldinger.   The generosity of SRI, in providing
02600	computer time and language consultations, should not go unmentioned.
02700	Also valuable were the critical readings of this paper by R. Davis
02800	and T. Winograd.